Solution to the FiveThirtyEight Riddler puzzle “Can You Find The Fish In State Names?” (May 22, 2020).

Ohio is the only state whose name doesn’t share any letters with the word “mackerel.” It’s strange, but it’s true.

But that isn’t the only pairing of a state and a word you can say that about — it’s not even the only fish! Kentucky has “goldfish” to itself, Montana has “jellyfish” and Delaware has “monkfish,” just to name a few.

What is the longest “mackerel?” That is, what is the longest word that doesn’t share any letters with exactly one state? (If multiple “mackerels” are tied for being the longest, can you find them all?)

Extra credit: Which state has the most “mackerels?” That is, which state has the most words for which it is the only state without any letters in common with those words?

(For both the Riddler and the extra credit, please refer to Friend of the Riddler™ Peter Norvig’s word list.)
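To make the definition concrete, here is a minimal Python sketch of the test the puzzle describes: a word is a mackerel for a state when that state is the only one of the 50 whose name shares no letters with the word. The function name `mackerel_state` is hypothetical, and the sketch assumes names are compared as lowercase letter sets with spaces removed.

```python
# The 50 state names, as listed in the puzzle.
STATES = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho", "Illinois",
    "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland",
    "Massachusetts", "Michigan", "Minnesota", "Mississippi", "Missouri", "Montana",
    "Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico", "New York",
    "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon", "Pennsylvania",
    "Rhode Island", "South Carolina", "South Dakota", "Tennessee", "Texas", "Utah",
    "Vermont", "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming",
]

# Letter set for each state: lowercase, spaces removed (assumption of this sketch).
STATE_LETTERS = {s: set(s.lower().replace(" ", "")) for s in STATES}


def mackerel_state(word):
    """Return the single state sharing no letters with `word`, or None (hypothetical helper)."""
    letters = set(word.lower())
    # States whose names have no letter in common with the word.
    disjoint = [s for s, ls in STATE_LETTERS.items() if letters.isdisjoint(ls)]
    return disjoint[0] if len(disjoint) == 1 else None


print(mackerel_state("mackerel"))  # Ohio
print(mackerel_state("goldfish"))  # Kentucky
print(mackerel_state("q"))         # None: no state name contains "q", so all 50 qualify
```

The last call shows why "exactly one" matters: a word that avoids several states is not a mackerel for any of them.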

Norvig’s word list has about 260K words, so checking every word against all 50 states means roughly 13M comparisons (260K × 50), which brute force handles easily.
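Each disjointness check is already cheap with sets, but it can be made cheaper still by encoding a name as a 26-bit integer, one bit per letter: two names share no letters exactly when the bitwise AND of their masks is zero. This is an alternative technique, not what this post uses; `letter_mask` and `shares_no_letters` are hypothetical helper names.

```python
def letter_mask(text):
    """Encode the letters a-z in `text` as bits of an int (bit 0 = 'a')."""
    mask = 0
    for ch in text.lower():
        if "a" <= ch <= "z":  # skip spaces and any other characters
            mask |= 1 << (ord(ch) - ord("a"))
    return mask


def shares_no_letters(a, b):
    """True when the two strings have no letter in common."""
    return letter_mask(a) & letter_mask(b) == 0


print(shares_no_letters("Ohio", "mackerel"))      # True
print(shares_no_letters("Kentucky", "mackerel"))  # False (they share c, e and k)
print(letter_mask("abc"))                         # 7
```

In a full run you would compute the 50 state masks once and compare each word’s mask against them with a single AND per pair.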

library(tidyverse)  # dplyr, tidyr, tibble, forcats and ggplot2 are used below

states_orig <- c(
  "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
  "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho", "Illinois",
  "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland",
  "Massachusetts", "Michigan", "Minnesota", "Mississippi", "Missouri", "Montana",
  "Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico", "New York",
  "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon", "Pennsylvania",
  "Rhode Island", "South Carolina", "South Dakota", "Tennessee", "Texas", "Utah",
  "Vermont", "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming"
)

# lowercase names with spaces removed, e.g. "New York" -> "newyork"
states_mod <- gsub(" ", "", states_orig) |> tolower()
names(states_mod) <- states_orig

# Peter Norvig's word list, one word per line
words <- readLines("https://norvig.com/ngrams/word.list")
# Python chunk (runs through reticulate; `r.<name>` reads an object from the R session)
words = r.words
states = r.states_orig
states_set = [set(state) for state in r.states_mod]

# A word is a "mackerel" when exactly one state shares no letters with it;
# map each such word to that state.
mackerels = {}
for word in words:
    letters = set(word)
    # states whose names have no letters in common with the word
    no_common = [state for state, s in zip(states, states_set) if letters.isdisjoint(s)]
    if len(no_common) == 1:
        mackerels[word] = no_common[0]

Answer

library(reticulate)
mackerels <- py$mackerels

# one row per mackerel: the word and the single state it avoids
d <- enframe(mackerels, name = "word", value = "state") |> unnest(state)

d |>
  mutate(word_length = nchar(word)) |>
  arrange(desc(word_length))
## # A tibble: 45,385 × 3
##    word                    state       word_length
##    <chr>                   <chr>             <int>
##  1 counterproductivenesses Alabama              23
##  2 hydrochlorofluorocarbon Mississippi          23
##  3 counterproductiveness   Alabama              21
##  4 unconscientiousnesses   Alabama              21
##  5 counterconditionings    Alabama              20
##  6 deoxycorticosterones    Alabama              20
##  7 expressionlessnesses    Utah                 20
##  8 hyperconsciousnesses    Alabama              20
##  9 hypersensitivenesses    Alabama              20
## 10 incompressiblenesses    Utah                 20
## # ℹ 45,375 more rows

Two words tie for the longest mackerel at 23 letters: “counterproductivenesses” (Alabama) and “hydrochlorofluorocarbon” (Mississippi).
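Because the puzzle asks for every word tied for the longest length, it is safer to filter on the maximum length than to read off the first row. A minimal Python sketch over a `{word: state}` dict like the `mackerels` dict built above; the function name `longest_mackerels` is hypothetical, and the sample dict is a small hand-picked subset of the pairs shown in the output above plus the puzzle’s own “mackerel”/Ohio example.

```python
def longest_mackerels(mackerels):
    """Return every (word, state) pair whose word has the maximum length."""
    if not mackerels:
        return []
    longest = max(len(word) for word in mackerels)
    return sorted((w, s) for w, s in mackerels.items() if len(w) == longest)


sample = {
    "counterproductivenesses": "Alabama",
    "hydrochlorofluorocarbon": "Mississippi",
    "expressionlessnesses": "Utah",
    "mackerel": "Ohio",
}
print(longest_mackerels(sample))
# [('counterproductivenesses', 'Alabama'), ('hydrochlorofluorocarbon', 'Mississippi')]
```

On the R side, `d |> filter(nchar(word) == max(nchar(word)))` applies the same filter.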

Extra credit

d |> count(state, sort = TRUE)
## # A tibble: 32 × 2
##    state           n
##    <chr>       <int>
##  1 Ohio        11342
##  2 Alabama      8274
##  3 Utah         6619
##  4 Mississippi  4863
##  5 Hawaii       1763
##  6 Kentucky     1580
##  7 Wyoming      1364
##  8 Tennessee    1339
##  9 Alaska       1261
## 10 Nevada       1229
## # ℹ 22 more rows

Ohio has the most mackerels, with 11,342.
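The `count(state, sort = TRUE)` step has a direct Python counterpart: `collections.Counter` applied to the dict’s values. A minimal sketch; the helper name `mackerels_per_state` is hypothetical, and the sample dict is hand-picked for illustration (each pair satisfies the definition), not the real result.

```python
from collections import Counter


def mackerels_per_state(mackerels):
    """Count how many mackerel words each state owns, most first."""
    return Counter(mackerels.values()).most_common()


# Hand-picked {word: state} pairs; the real dict has about 45K entries.
sample = {
    "mackerel": "Ohio",
    "mackerels": "Ohio",
    "goldfish": "Kentucky",
    "counterproductivenesses": "Alabama",
}
print(mackerels_per_state(sample))
# [('Ohio', 2), ('Kentucky', 1), ('Alabama', 1)]
```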

Plot for comparison (the ten states with the most mackerels, the rest lumped into “Other”):

d |>
  count(state) |>
  mutate(
    # keep the 10 states with the most mackerels; lump the rest into "Other"
    state = fct_lump_n(state, n = 10, w = n),
    # order bars by total count so the lumped "Other" bar is ranked by its sum
    state = fct_reorder(state, n, .fun = sum, .desc = TRUE)
  ) |>
  ggplot(aes(state, n)) +
  geom_col() +
  labs(title = "Number of mackerels by state", x = "State", y = "Count")